High Breakdown Point Estimation in Regression

Author

  • T. Jurczyk
Abstract

In robust regression theory, estimators that can “resist” contamination of nearly fifty percent of the data have been studied intensively because of their practical importance. In this paper we describe three methods with a high breakdown point: the least trimmed squares (LTS), the least median of squares (LMS), and their generalization, the least weighted squares (LWS) estimator¹. Instead of showing how powerful these procedures can be, we are especially interested in potential “problems” and in situations where the methods can behave rather strangely. These are illustrated by various data examples. We want to show that high breakdown point regression should be performed with caution.

Introduction

During the few decades before 1984 there was a great effort to find multivariate regression estimators with a breakdown point of nearly 50% (the exact definition of the breakdown point is given below). This was motivated by the belief that such estimators can give a hint as to which of the estimates are close to the “true” model (see Rousseeuw and Leroy [1987]). The first such method was based on an idea of Hampel [1975] and presented by Rousseeuw [1984]; it is called the least median of squares. The same paper (Rousseeuw [1984]) also introduced another important estimator, the least trimmed squares.

Before we give the exact definitions, let us set up some notation. We consider the linear regression model

$$Y_i = X_i'\beta^0 + e_i = \sum_{j=1}^{p} X_{ij}\beta_j^0 + e_i, \qquad i = 1, 2, \ldots, n.$$

For any $\beta \in \mathbb{R}^p$, $r_i(\beta) = Y_i - X_i'\beta$ denotes the $i$-th residual and $r^2_{(j)}(\beta)$ the $j$-th order statistic among the squared residuals.

Definition 1. Let $n/2 \le h \le n$. Then

$$\hat\beta^{(LMS,n,h)} = \arg\min_{\beta \in \mathbb{R}^p} r^2_{(h)}(\beta) \qquad \text{and} \qquad \hat\beta^{(LTS,n,h)} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{h} r^2_{(i)}(\beta)$$

are called the least median of squares (LMS) and the least trimmed squares (LTS) estimator, respectively.

We also define the estimator proposed by Víšek [2001].
This estimator is also based on the ordered squared residuals, but these are additionally weighted.

Definition 2. For any $n \in \mathbb{N}$, let $1 = w_1 \ge w_2 \ge \cdots \ge w_n \ge 0$ be some weights. Then

$$\hat\beta^{(LWS,n,w)} = \arg\min_{\beta \in \mathbb{R}^p} \sum_{i=1}^{n} w_i\, r^2_{(i)}(\beta)$$

is called the least weighted squares (LWS) estimator.

Note that an individual weight is not tied directly to a particular observation (do not confuse this with another regression method, weighted least squares). The estimator itself assigns the weights to the observations implicitly (as in the LTS case). From the two definitions it is clear that the LTS is a special case of the LWS with $w_i = I\{i \le h\}$ ($I$ denotes the indicator function). Ordinary least squares (OLS) is a special case of the LWS as well (with $w_i = 1$ for all $i$). (If we wished, we could define the LWS even more generally, without the monotonicity restriction on the weights; then the LMS would also be a special case of the LWS. In what follows, however, we use the LWS only in the form of Definition 2.)

¹ The breakdown point of the LWS estimator depends on the choice of its weights.

WDS'08 Proceedings of Contributed Papers, Part I, 94–99, 2008. ISBN 978-80-7378-065-4 © MATFYZPRESS
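The objective functions in Definitions 1 and 2 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names, the toy data, and the candidate coefficient vector are all made up here and do not come from the paper, and the code merely evaluates each criterion at a given β rather than performing the (much harder) minimization over β.

```python
def sorted_squared_residuals(X, y, beta):
    """Squared residuals r_i(beta)^2 = (Y_i - X_i' beta)^2, in ascending order."""
    squared = []
    for x_i, y_i in zip(X, y):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        squared.append((y_i - fitted) ** 2)
    return sorted(squared)


def lms_objective(X, y, beta, h):
    """LMS criterion: the h-th smallest squared residual (h is 1-based)."""
    return sorted_squared_residuals(X, y, beta)[h - 1]


def lts_objective(X, y, beta, h):
    """LTS criterion: sum of the h smallest squared residuals."""
    return sum(sorted_squared_residuals(X, y, beta)[:h])


def lws_objective(X, y, beta, weights):
    """LWS criterion: weights are applied to the ORDERED squared residuals,
    not to fixed observations."""
    r2 = sorted_squared_residuals(X, y, beta)
    return sum(w * r for w, r in zip(weights, r2))


# Hypothetical toy data: intercept column plus one regressor; the last
# observation is a gross outlier.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [0.0, 1.0, 2.0, 3.0, 40.0]
beta = [0.5, 1.0]          # an arbitrary candidate, not a fitted estimate
n, h = len(y), 3           # h satisfies n/2 <= h <= n

lts_weights = [1.0 if i < h else 0.0 for i in range(n)]  # w_i = I{i <= h}
ols_weights = [1.0] * n                                  # w_i = 1 for all i

lms_value = lms_objective(X, y, beta, h)
lts_value = lts_objective(X, y, beta, h)
lws_as_lts = lws_objective(X, y, beta, lts_weights)
lws_as_ols = lws_objective(X, y, beta, ols_weights)

print(lms_value, lts_value, lws_as_lts, lws_as_ols)
```

With the indicator weights the LWS criterion coincides with the LTS criterion, and with all weights equal to one it becomes the ordinary sum of squared residuals, which is dominated by the single outlier in this toy data set.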


Similar resources

Robust Regression

1. Introduction. Linear regression analysis is one of the most important statistical tools in many fields. Nearly all regression analysis relies on the method of least squares for estimating the parameters of the model. A problem often encountered in applying regression is the presence of one or more outliers in the data. Outliers can be generated from a simple opera...


EFFICIENT ROBUST ESTIMATION OF NONLINEAR REGRESSION PARAMETERS by Arnold

The least median of squares estimator (Rousseeuw, 1984) of linear regression parameters is a high breakdown estimator, meaning that, unlike the least squares estimator, it performs reasonably well when up to 50% outliers are present in a data set. Unfortunately, it lacks efficiency under normal errors. This disadvantage can be overcome by using the least median of squares estimator as a starti...


The Conditional Breakdown Properties of Robust Local Polynomial Estimators

Nonparametric regression techniques provide an effective way of identifying and examining structure in regression data. The standard approaches to nonparametric regression, such as local polynomial and smoothing spline estimators, are sensitive to unusual observations, and alternatives designed to be resistant to such observations have been proposed as a solution. Unfortunately, there has been little ...


SUGI 27: Robust Regression and Outlier Detection with the ROBUSTREG Procedure

Robust regression is an important tool for analyzing data that are contaminated with outliers. It can be used to detect outliers and to provide resistant (stable) results in the presence of outliers. This paper introduces the ROBUSTREG procedure, which is experimental in SAS/STAT Version 9. The ROBUSTREG procedure implements the most commonly used robust regression techniques. These include M ...


Robust estimation of the SUR model

This paper proposes robust regression to solve the problem of outliers in seemingly unrelated regression (SUR) models. The authors present an adaptation of S-estimators to SUR models. S-estimators are robust, with high breakdown point, and are much more efficient than other robust regression estimators commonly used in practice. Furthermore, modifications to Ruppert’s algorithm allow a fast eva...


Cluster-Based Bounded Influence Regression

A regression methodology is introduced that obtains competitive, robust, efficient, high breakdown regression parameter estimates as well as providing an informative summary regarding possible multiple outlier structure. The proposed method blends a cluster analysis phase with a controlled bounded influence regression phase, thereby referred to as cluster-based bounded influence regression, or ...




Publication date: 2008